refactor: Endpoint class as a single entrypoint uniting `@remote` and ServerlessResource-based classes by KAJdev · Pull Request #223 · runpod/flash

KAJdev · 2026-02-25T23:15:14Z

Unified Endpoint API

Replaces 8 resource config classes (LiveServerless, CpuLiveServerless, LiveLoadBalancer, CpuLiveLoadBalancer, ServerlessEndpoint, CpuServerlessEndpoint, LoadBalancerSlsResource, CpuLoadBalancerSlsResource) and the @remote decorator with a single Endpoint class.

Fixes AE-2259
Fixes AE-2306

Queue-based

  @Endpoint(name="worker", gpu=GpuType.ANY, dependencies=["torch"])
  async def predict(input_data: dict) -> dict:
      ...

Load-balanced

  api = Endpoint(name="service", cpu="cpu3c-1-2", workers=(1, 3))

  @api.post("/predict")
  async def predict(data: dict) -> dict:
      ...

Client mode

  ep = Endpoint(id="ep-abc123")
  job = await ep.run({"prompt": "hello"})
  await job.wait()
  print(job.output)

What changed

Endpoint is a facade that internally creates the old resource config objects, so the existing deployment/provisioning/handler pipeline continues working unchanged
QB vs LB is inferred from usage pattern (decorator vs route registration)
GPU vs CPU is a parameter (gpu= / cpu=), not a class choice
EndpointJob wraps job responses with status(), wait(), cancel(), and property access (job.id, job.output, job.error, job.done)
Scanner, manifest builder, and resource discovery all recognize Endpoint patterns
Legacy classes and @remote emit DeprecationWarning on import/use
Skeleton templates (flash init) generate the new API


runpod-Henrik · 2026-02-26T22:18:41Z

Pulled this down to verify with my examples. A few notes:

ServerlessScalerType not exposed

04_scaling_performance/01_autoscaling/gpu_worker.py configures scaling strategies:

scale_to_zero_config = LiveServerless(
    name="04_01_scale_to_zero",
    gpus=[GpuGroup.ANY],
    workersMin=0, workersMax=3, idleTimeout=5,
    scalerType=ServerlessScalerType.QUEUE_DELAY,
    scalerValue=4,
)

This controls how autoscaling decides to add workers — QUEUE_DELAY scales based on how long jobs wait in queue,
REQUEST_COUNT scales based on pending request volume. The example shows three strategies side by side (scale-to-zero,
always-on, high-throughput) with different scalerType/scalerValue combos.

Endpoint() doesn't have these params, so there's no way to express this:

What we'd want:

@endpoint(name="worker", gpu=GpuGroup.ANY, workers=(0, 3),
          scaler_type=ServerlessScalerType.QUEUE_DELAY, scaler_value=4)
async def scale_to_zero_inference(payload: dict) -> dict: ...

Could we add scaler_type / scaler_value (or a combined scaler= param)?

PodTemplate features not surfaced
(new example not checked in yet)
03_advanced_workers/04_custom_images/gpu_worker.py uses PodTemplate for full Docker control:

template = PodTemplate(
    name="03_04_custom_template",
    imageName="runpod/pytorch:2.1.0-py3.10-cuda11.8.0-devel-ubuntu22.04",
    containerDiskInGb=30,
    dockerArgs="--shm-size=2g",
    startScript="echo 'Worker starting with custom image'",
    ports="8080/http",
    # containerRegistryAuthId="your-auth-id",  # for private registries
)

gpu_config = ServerlessEndpoint(
    name="03_04_custom_images",
    gpus=[GpuGroup.ADA_24],
    template=template,
    workersMin=0, workersMax=2, idleTimeout=5,
)

Endpoint(image=) only takes the image name string. The other template features — dockerArgs (e.g. shared memory size),
startScript (pre-run setup), ports, containerDiskInGb, and containerRegistryAuthId (private registries) — have no
equivalent. These are important for real-world deployments where the default Flash image doesn't work (custom CUDA
versions, private model servers, etc.).

Could we either add a template= param that accepts a PodTemplate, or surface these as top-level kwargs on Endpoint?

Class-based @endpoint?

Two examples use @remote on a class for stateful workers. Here's the pattern from
05_data_workflows/01_network_volumes/gpu_worker.py:

@remote(resource_config=gpu_config, dependencies=["diffusers", "torch", "transformers"])
class SimpleSD:
    def __init__(self):
        # Runs once at worker startup — loads 4GB model into GPU memory
        self.pipe = StableDiffusionPipeline.from_pretrained(...)
        self.pipe = self.pipe.to("cuda")

    async def generate_image(self, prompt: str) -> dict:
        # Uses self.pipe — already warm in GPU memory
        image = self.pipe(prompt=prompt, ...).images[0]
        return {"image_path": image_path}

The class is instantiated once when the worker boots. The model stays in GPU memory via self.pipe and every request
calls methods on the same instance — no re-loading a 4GB model per request.

With function-based @endpoint, there's no self to hold state:

@endpoint(name="worker", gpu=GpuGroup.ANY, dependencies=["diffusers", "torch"])
async def generate_image(prompt: str) -> dict:
    # Re-loading a 4GB model on every request — 30+ seconds of overhead each time
    pipe = StableDiffusionPipeline.from_pretrained(...)
    pipe = pipe.to("cuda")
    image = pipe(prompt=prompt, ...).images[0]
    return {"image_path": image_path}

Does @endpoint(...) support decorating classes the same way @remote does? If not, we'd need a workaround (module-level
global with lazy init) or keep these on the legacy API.

GpuGroup vs GpuType

The PR's skeleton templates use GpuType:

@endpoint(name="gpu_worker", gpu=GpuType.ANY, dependencies=["torch"])
async def gpu_hello(input_data: dict) -> dict: ...

But existing examples all use GpuGroup:

@endpoint(name="worker", gpu=GpuGroup.ADA_24)
async def my_func(payload: dict) -> dict: ...

Both work, since Endpoint(gpu=) accepts either. But they mean different things: GpuType is a specific GPU model (e.g. RTX
4090), while GpuGroup is a family (e.g. all Ada 24GB cards: 4090, L4, etc.). For examples, which should we standardize on?
Current thinking:

GpuGroup for "give me any GPU in this tier" (most examples)
GpuType only for the GPU selection example that targets a specific card

KAJdev · 2026-02-26T23:06:25Z

> This controls how autoscaling decides to add workers — QUEUE_DELAY scales based on how long jobs wait in queue,
> REQUEST_COUNT scales based on pending request volume. The example shows three strategies side by side (scale-to-zero,
> always-on, high-throughput) with different scalerType/scalerValue combos.
>
> Endpoint() doesn't have these params, so there's no way to express this:

will work on adding those parameters

> Endpoint(image=) only takes the image name string. The other template features — dockerArgs (e.g. shared memory size),
> startScript (pre-run setup), ports, containerDiskInGb, and containerRegistryAuthId (private registries) — have no
> equivalent. These are important for real-world deployments where the default Flash image doesn't work (custom CUDA
> versions, private model servers, etc.).
>
> Could we either add a template= param that accepts a PodTemplate, or surface these as top-level kwargs on Endpoint?

👍

> Does @endpoint(...) support decorating classes the same way @remote does? If not, we'd need a workaround (module-level
> global with lazy init) or keep these on the legacy API.

Endpoint does support classes

> Both work — Endpoint(gpu=) accepts either. But they mean different things: GpuType is a specific GPU model (e.g. RTX
> 4090), GpuGroup is a family (e.g. all Ada 24GB cards: 4090, L4, etc.). For examples, which should we standardize on?

we should prefer GpuType in simpler examples, since it is easier to understand, but expand to GpuGroup for situations when more scale is important

runpod-Henrik · 2026-02-27T21:14:57Z

QA Report

Status: WARN
PR: #223 — feat: single entrypoint
Agent: flash-qa (PR mode)

CI Status

All 6 Quality Gates pass (Python 3.10–3.14 + Build Package). No CI regressions detected.

Note: Unable to run local tests — worktree branch checkout was blocked by sandbox policy. All analysis below is from static diff review and CI results.

PR Scope

13 source files changed/added (695-line endpoint.py is new)
9 test files added with 161 test methods
Key changes: new Endpoint class, deprecation warnings on legacy classes/remote, scanner + discovery + manifest + provisioner updates for Endpoint patterns

Test File Summary

Test File	Tests	Coverage Area
`test_endpoint.py`	~55	Endpoint construction, init params, QB/LB decorators, resource config type matrix (2x2x2), caching
`test_endpoint_client.py`	~40	EndpointJob lifecycle, run/runsync/cancel, _ensure_endpoint_ready (id + image modes), LB client requests, end-to-end flows
`test_deprecations.py`	~20	Deprecation warnings for 8 legacy classes + remote decorator, non-deprecated names verified
`test_discovery_endpoint.py`	~10	ResourceDiscovery with Endpoint LB patterns, resolve, directory scan, mixed legacy+Endpoint
`test_skeleton_endpoint.py`	4	Skeleton templates use Endpoint API (gpu/cpu/lb workers + README)
`test_scanner_endpoint.py`	~20	Scanner: QB functions/classes, LB routes, all HTTP methods, mixed patterns, edge cases
`test_manifest_endpoint.py`	~6	Manifest building with Endpoint QB/LB metadata, deployment config extraction with unwrapping
`test_run_endpoint.py`	~7	flash run: scan + server generation for QB/LB/mixed Endpoint patterns
`test_resource_provisioner.py`	+4	Endpoint resource_type resolution to correct internal classes (4 combinations)

PR Diff Analysis

No bare exceptions
No hardcoded secrets (RUNPOD_API_KEY properly popped from env dict)
No print() in library source (print() only in skeleton if __name__ == "__main__" blocks and README examples — acceptable)
Public API surface changes documented: Endpoint and EndpointJob added to __all__, TYPE_CHECKING imports updated
Deprecation warnings added with stacklevel=2 for correct caller attribution
_internal=True flag on remote() suppresses double-warnings when called from Endpoint internals
Resource config caching prevents redundant provisioning
remote() deprecation is a breaking behavioral change — all existing users importing remote will get DeprecationWarning. This is intentional but should be documented in release notes.

Observations & Issues

1. Dual-purpose methods create subtle API surface
The .get()/.post()/.put()/.delete()/.patch() methods return either a decorator (no data arg, non-client mode) or a coroutine (client mode). This is determined by self.is_client. While tested, this design could confuse users:

ep = Endpoint(name="my-api")
ep.post("/compute")          # returns a decorator
ep_client = Endpoint(id="x")
ep_client.post("/compute")   # returns a coroutine

The distinction is tested but the boundary between "no data arg = decorator" vs "data=None = client call" is not explicitly tested. A user calling ep.post("/compute", None) in decorator mode would get a coroutine instead of a decorator.
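One way to make the "no data argument" vs "data=None" boundary unambiguous is a private sentinel default, so the method can tell an omitted argument from an explicit `None`. `ToyEndpoint` below is a hypothetical stand-in, not the PR's `Endpoint`; in decorator mode it rejects any data argument, including an explicit `None`, and in client mode it returns a coroutine.

```python
import asyncio
from typing import Any, Callable

_MISSING = object()  # sentinel: distinguishes "argument omitted" from "None passed"


class ToyEndpoint:
    """Hypothetical stand-in for an object whose .post() is dual-purpose."""

    def __init__(self, is_client: bool):
        self.is_client = is_client
        self.routes: dict = {}

    def post(self, path: str, data: Any = _MISSING, **kwargs):
        if self.is_client:
            payload = None if data is _MISSING else data
            return self._client_request("POST", path, payload, **kwargs)
        if data is not _MISSING or kwargs:
            raise TypeError("data/kwargs are only valid in client mode")

        def decorator(func: Callable) -> Callable:
            self.routes[("POST", path)] = func
            return func

        return decorator

    async def _client_request(self, method: str, path: str, data: Any, **kwargs):
        # A real client would issue an HTTP request here; this just echoes its inputs.
        return {"method": method, "path": path, "data": data}


if __name__ == "__main__":
    client = ToyEndpoint(is_client=True)
    print(asyncio.run(client.post("/compute", {"x": 1})))
```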

2. _is_live_provisioning() default heuristic
When FLASH_IS_LIVE_PROVISIONING is unset, the function defaults to live mode unless RUNPOD_ENDPOINT_ID or RUNPOD_POD_ID is set. This heuristic is reasonable but not tested — no test verifies the fallback behavior when the env var is missing.
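The fallback described above can be pinned down as a small pure function over an environment mapping, which also makes the missing tests trivial to write. This is a hypothetical reimplementation based only on the description in this report. Which strings count as "true", and whether an explicit FLASH_IS_LIVE_PROVISIONING overrides RUNPOD_ENDPOINT_ID, are assumptions.

```python
from typing import Mapping

_TRUTHY = {"1", "true", "yes"}  # assumption: accepted spellings of "true"


def is_live_provisioning(env: Mapping[str, str]) -> bool:
    """Hypothetical mirror of the described _is_live_provisioning() heuristic."""
    explicit = env.get("FLASH_IS_LIVE_PROVISIONING")
    if explicit is not None:
        # Assumption: an explicit setting wins over any other signal.
        return explicit.strip().lower() in _TRUTHY
    # No explicit signal: assume live mode unless already running
    # inside a deployed endpoint or pod.
    return not (env.get("RUNPOD_ENDPOINT_ID") or env.get("RUNPOD_POD_ID"))


CASES = [
    ({}, True),
    ({"FLASH_IS_LIVE_PROVISIONING": "true"}, True),
    ({"FLASH_IS_LIVE_PROVISIONING": "false"}, False),
    ({"RUNPOD_ENDPOINT_ID": "ep-abc123"}, False),
    ({"RUNPOD_POD_ID": "pod-1"}, False),
    ({"FLASH_IS_LIVE_PROVISIONING": "true", "RUNPOD_ENDPOINT_ID": "ep-abc123"}, True),
]

for env, expected in CASES:
    assert is_live_provisioning(env) == expected, env
```

Passing `os.environ` at the call site keeps the function itself free of global state.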

3. Endpoint with name=None and routes
Endpoint(name=None, id=None) raises ValueError, and Endpoint() with no arguments does too. One scanner edge-case test nonetheless defines my_api = Endpoint(). If that module were executed (for example, imported during manifest extraction), it would raise ValueError("name or id is required"); the test passes only because scanning is AST-based and never executes the code.

Test Quality Assessment

Strengths:

Full 2x2x2 resource config type matrix tested (qb/lb x gpu/cpu x live/deploy = 8 combinations)
Client mode end-to-end flows well covered (run, wait, cancel, timeout)
Edge cases: FastAPI @app.get() not falsely matched, unregistered variable routes ignored, nested directories, cross-call detection
Mixed legacy + Endpoint coexistence tested at scanner, discovery, and manifest levels
Assertion quality is good — specific field checks, not just len() assertions

Missing Coverage:

_is_live_provisioning() standalone tests — no test verifies the fallback heuristic when env var is unset
_normalize_gpu() / _normalize_cpu() error paths — invalid types (e.g., gpu="string") not tested
Endpoint.__call__ with invalid func — what happens if you @ep decorate a non-callable?
Client mode PUT/DELETE/PATCH calls — only GET and POST client calls tested in TestClientRequest; PUT, DELETE, PATCH use the same _client_request path but are not explicitly verified
EndpointJob.wait() backoff intervals — the exponential backoff logic (_POLL_INITIAL_INTERVAL, _POLL_BACKOFF_FACTOR, _POLL_MAX_INTERVAL) is not verified; tests only check correctness, not timing behavior
Thread safety of _cached_resource_config — no concurrent access test (low risk for typical usage)
Deprecation warning stacklevel — no test verifies the warning points to the caller's frame, not the internal frame
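The stacklevel gap is straightforward to close: record the warning and compare its reported line number with the caller's line. `legacy_remote` here is a hypothetical stand-in for the deprecated `remote()`, including the `_internal` escape hatch described under PR Diff Analysis.

```python
import inspect
import warnings


def legacy_remote(*args, _internal: bool = False, **kwargs):
    """Hypothetical stand-in for a deprecated decorator factory."""
    if not _internal:
        warnings.warn(
            "remote() is deprecated; use Endpoint instead",
            DeprecationWarning,
            stacklevel=2,  # attribute the warning to our caller, not this frame
        )
    return lambda func: func


def call_legacy() -> int:
    # Record the line number of the legacy_remote() call on the next line.
    line = inspect.currentframe().f_lineno + 1
    legacy_remote()
    return line


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    expected_line = call_legacy()
    legacy_remote(_internal=True)  # internal calls should stay silent

assert len(caught) == 1
assert caught[0].category is DeprecationWarning
assert caught[0].lineno == expected_line
```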

Suggested Improvements:

Add a parametrized test for _normalize_gpu and _normalize_cpu with invalid inputs
Add a test for _is_live_provisioning() with various env combinations (unset, "true", "false", RUNPOD_ENDPOINT_ID set)
Consider adding a PUT/DELETE/PATCH client call test for completeness (even if trivially same path)
The _mock_httpx_client helper is well-designed but could be moved to conftest for reuse

Review Comments Integration

The PR already addresses the 4 review items from @runpod-Henrik:

scaler_type/scaler_value — added in commit 3787b4b (params on Endpoint + manifest extraction + provisioner support)
PodTemplate — template= param added in same commit
Class-based @endpoint — confirmed supported, tested in TestEndpointQBClass and TestScanEndpointWorkers.test_endpoint_class_discovered_as_qb
GpuGroup vs GpuType — skeleton templates use GpuType.ANY, examples use GpuGroup — per author's preference

Recommendation

MERGE WITH NOTES

The PR is solid — 161 tests, CI green on all Python versions, comprehensive coverage of the new Endpoint API. The dual-purpose method design and _is_live_provisioning() heuristic warrant documentation but are not blockers. Two suggestions before merge:

Add release notes documenting the remote() deprecation warning (all existing code will emit warnings)
Consider adding 2-3 tests for the missing _is_live_provisioning() fallback and _normalize_gpu/_normalize_cpu error paths

Generated by flash-qa agent

Copilot

Pull request overview

This PR introduces a unified Endpoint facade as the single user-facing API for Flash endpoints, consolidating the previous @remote decorator + multiple resource config classes into one entrypoint while keeping the underlying provisioning/handler pipeline intact via internal resource-config unwrapping.

Changes:

Added runpod_flash.endpoint.Endpoint and EndpointJob to support QB (decorator) mode, LB (route registration) mode, and client mode (id= / image=).
Updated scanning/discovery/manifest generation and skeleton templates/docs to recognize and showcase the new Endpoint patterns.
Marked legacy @remote and legacy resource config classes as deprecated (warnings + compatibility import paths), and added extensive unit tests.

Reviewed changes

Copilot reviewed 30 out of 31 changed files in this pull request and generated 5 comments.


File	Description
tests/unit/test_skeleton_endpoint.py	Verifies skeleton templates use `Endpoint` (not `@remote` / legacy classes).
tests/unit/test_endpoint_client.py	Adds unit tests for client-mode calls and `EndpointJob` lifecycle methods.
tests/unit/test_endpoint.py	Adds broad unit coverage for `Endpoint` construction, mode inference, and internal config selection.
tests/unit/test_discovery_endpoint.py	Ensures `ResourceDiscovery` recognizes `Endpoint` LB patterns and resolves to deployable resources.
tests/unit/test_deprecations.py	Ensures deprecation warnings are emitted for legacy imports/usages.
tests/unit/runtime/test_resource_provisioner.py	Extends resource provisioner tests for manifest `resource_type="Endpoint"`.
tests/unit/cli/commands/test_run_endpoint.py	Tests worker scanning + server generation for Endpoint QB/LB patterns.
tests/unit/cli/commands/build_utils/test_scanner_endpoint.py	Adds comprehensive scanner tests for Endpoint QB/LB patterns and edge cases.
tests/unit/cli/commands/build_utils/test_manifest_endpoint.py	Tests manifest-building and deployment-config extraction for Endpoint resources.
src/runpod_flash/runtime/resource_provisioner.py	Adds manifest-time mapping from `Endpoint` to underlying resource classes and extracts scaler fields.
src/runpod_flash/endpoint.py	Implements `Endpoint` + `EndpointJob`, including internal resource-config selection and client methods.
src/runpod_flash/core/discovery.py	Extends discovery to detect `Endpoint` LB usage (`ep = Endpoint(...)` + `@ep.get/post/...`).
src/runpod_flash/client.py	Deprecates `remote()` (warning) and adds `_internal` flag for Endpoint internals.
src/runpod_flash/cli/utils/skeleton_template/lb_worker.py	Updates LB skeleton template to use `Endpoint` route registration.
src/runpod_flash/cli/utils/skeleton_template/gpu_worker.py	Updates GPU QB skeleton template to `@Endpoint(...)` and adds a small main test harness.
src/runpod_flash/cli/utils/skeleton_template/cpu_worker.py	Updates CPU QB skeleton template to `@Endpoint(...)` and adds a small main test harness.
src/runpod_flash/cli/utils/skeleton_template/README.md	Updates template README to document QB/LB/client usage via `Endpoint`.
src/runpod_flash/cli/commands/run.py	Updates CLI “no workers found” guidance to show `Endpoint` examples.
src/runpod_flash/cli/commands/build_utils/scanner.py	Adds Endpoint QB/LB AST detection and metadata emission.
src/runpod_flash/cli/commands/build_utils/manifest.py	Unwraps Endpoint when extracting deployment config + adds scaler fields.
src/runpod_flash/cli/commands/_run_server_helpers.py	Unwraps Endpoint in LB execution helper before provisioning.
src/runpod_flash/__init__.py	Exposes `Endpoint`/`EndpointJob` and adds deprecation warnings for legacy names.
docs/Using_Remote_With_LoadBalancer.md	Rewritten to describe LB endpoints via `Endpoint` rather than `@remote` + LB classes.
docs/Load_Balancer_Endpoints.md	Updates docs to position `Endpoint` as user-facing API, legacy classes as internal.
docs/LoadBalancer_Runtime_Architecture.md	Updates runtime docs terminology and examples to `Endpoint`.
docs/GPU_Provisioning.md	Updates examples to use `Endpoint` patterns and scaler params.
docs/Flash_SDK_Reference.md	Updates SDK reference to `Endpoint` as primary API + adds `EndpointJob` section.
docs/Flash_Deploy_Guide.md	Updates deployment guide terminology and diagrams to `Endpoint`.
docs/Deployment_Architecture.md	Updates architecture doc to reflect Endpoint-based scanning/manifest.
docs/Cross_Endpoint_Routing.md	Updates routing doc examples to use `Endpoint`.
.gitignore	Adds `/.pi` ignore entry.



runpod-Henrik

Code Review — PR #223: Unified Endpoint Class

Nice refactor unifying 8 resource config classes + @remote into a single Endpoint facade. The API design is clean and the test coverage is solid (+2,688 lines of tests). Found a few issues:

Bug 1 (HIGH): `_normalize_workers` accepts negative and inverted values

endpoint.py — _normalize_workers()

def _normalize_workers(workers):
    if workers is None:
        return (0, 1)
    if isinstance(workers, int):
        return (0, workers)  # workers=-5 → (0, -5)
    if isinstance(workers, (tuple, list)) and len(workers) == 2:
        return (int(workers[0]), int(workers[1]))  # (-3, -1) accepted

Endpoint(name="x", workers=-5) → (0, -5) silently accepted
Endpoint(name="x", workers=(10, 2)) → min > max silently accepted
Endpoint(name="x", workers=(3.99, 7.01)) → silently truncated to (3, 7) by int()

Fix: Add validation:

min_w, max_w = ...
if min_w < 0 or max_w < 0:
    raise ValueError(f"workers cannot be negative: ({min_w}, {max_w})")
if min_w > max_w:
    raise ValueError(f"workers min ({min_w}) cannot exceed max ({max_w})")
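Putting the suggested checks together, a validated `_normalize_workers` might look like the sketch below. It keeps the `(0, 1)` default and the `int` shorthand shown above, and additionally rejects non-integer counts instead of silently truncating them; that last choice is a suggestion, not something the review or the PR specifies.

```python
from typing import Optional, Sequence, Tuple, Union

WorkersSpec = Optional[Union[int, Sequence[int]]]


def _is_int(value) -> bool:
    # bool is a subclass of int, but workers=True is almost certainly a mistake.
    return isinstance(value, int) and not isinstance(value, bool)


def normalize_workers(workers: WorkersSpec) -> Tuple[int, int]:
    """Sketch of _normalize_workers() with the validation suggested above."""
    if workers is None:
        return (0, 1)
    if _is_int(workers):
        min_w, max_w = 0, workers
    elif isinstance(workers, (tuple, list)) and len(workers) == 2:
        min_w, max_w = workers
    else:
        raise TypeError(f"workers must be an int or a (min, max) pair, got {workers!r}")
    if not (_is_int(min_w) and _is_int(max_w)):
        raise TypeError(f"worker counts must be integers, got ({min_w!r}, {max_w!r})")
    if min_w < 0 or max_w < 0:
        raise ValueError(f"workers cannot be negative: ({min_w}, {max_w})")
    if min_w > max_w:
        raise ValueError(f"workers min ({min_w}) cannot exceed max ({max_w})")
    return (min_w, max_w)
```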

Bug 2 (MEDIUM): No duplicate route detection in `_route()`

endpoint.py — _route()

def _route(self, method: str, path: str):
    # ...validation...
    def decorator(func):
        self._routes.append({"method": method, "path": path, ...})

Routes are appended to a list with no duplicate check. Two functions registered on @api.post("/predict") silently coexist — last one wins at runtime.

Fix: Check before appending:

existing = [(r["method"], r["path"]) for r in self._routes]
if (method, path) in existing:
    raise ValueError(f"duplicate route: {method} {path}")

Bug 3 (MEDIUM): No reserved path validation

endpoint.py — _route()

The docs explicitly list /execute and /ping as reserved paths, but _route() doesn't block them. @api.post("/execute") or @api.get("/ping") would collide with framework endpoints.

Fix: Add to _route():

_RESERVED_PATHS = frozenset({"/execute", "/ping"})
if path in _RESERVED_PATHS:
    raise ValueError(f"path {path} is reserved by the framework")
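Both route checks (duplicates from Bug 2 and reserved paths from Bug 3) fit naturally in one registration helper. `RouteRegistry` is a hypothetical minimal stand-in for the route-tracking part of `Endpoint`. Keying routes by `(method, path)` in a dict makes the duplicate check a constant-time lookup instead of rebuilding a list on every registration.

```python
from typing import Callable, Dict, Tuple

_RESERVED_PATHS = frozenset({"/execute", "/ping"})


class RouteRegistry:
    """Hypothetical stand-in for Endpoint's route bookkeeping."""

    def __init__(self) -> None:
        self._routes: Dict[Tuple[str, str], Callable] = {}

    def route(self, method: str, path: str) -> Callable[[Callable], Callable]:
        method = method.upper()  # treat "post" and "POST" as the same method
        if path in _RESERVED_PATHS:
            raise ValueError(f"path {path} is reserved by the framework")

        def decorator(func: Callable) -> Callable:
            key = (method, path)
            if key in self._routes:
                raise ValueError(f"duplicate route: {method} {path}")
            self._routes[key] = func
            return func

        return decorator
```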

Bug 4 (MEDIUM): Scanner hardcodes `is_live_resource=True` for all Endpoint patterns

scanner.py — _build_endpoint_qb_metadata() and _build_endpoint_route_metadata()

Both methods hardcode is_live_resource=True. This means flash deploy (non-live provisioning) would still see is_live_resource=True, potentially using the wrong resource class (Live* instead of deploy-time *Endpoint).

The _build_resource_config() in endpoint.py correctly calls _is_live_provisioning() at runtime, but the scanner metadata is set statically at scan time.

Fix: Either defer the flag or set it dynamically:

is_live_resource=_is_live_provisioning(),  # or leave as None, let downstream decide

Bug 5 (MEDIUM): `get()`/`post()` silently ignore `data` in decorator mode

endpoint.py — get()

def get(self, path: str, data: Any = None, **kwargs):
    if self.is_client:
        return self._client_request("GET", path, data, **kwargs)
    return self._route("GET", path)  # data silently dropped

In decorator mode, data and **kwargs are silently ignored. A user writing @api.get("/health", data={"key": "val"}) gets no error — the data is silently dropped.

Fix: In decorator mode, validate no extra args:

if data is not None or kwargs:
    raise TypeError(
        "data and kwargs are only valid in client mode (Endpoint with id= or image=). "
        "In decorator mode, use @api.get('/path') with no data argument."
    )

Bug 6 (LOW): `EndpointJob.wait()` can overshoot timeout

endpoint.py — EndpointJob.wait()

while not self.done:
    if deadline is not None and time.monotonic() >= deadline:
        raise TimeoutError(...)
    await asyncio.sleep(interval)  # sleeps THEN checks status
    await self.status()  # network call, can take arbitrary time
    interval = min(interval * _POLL_BACKOFF_FACTOR, _POLL_MAX_INTERVAL)

The deadline check happens before asyncio.sleep(interval) + the network call to status(). With _POLL_MAX_INTERVAL=5.0, the actual wait can overshoot by 5s + network latency. Not critical since it's a best-effort timeout, but worth documenting or checking deadline after the sleep too.
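A variant of the loop that never sleeps past the deadline checks the deadline at the top of every iteration and clamps each sleep to the remaining time. `FakeJob` and the initial interval and backoff factor are hypothetical (only the 5.0 s cap comes from this review). The network call inside `status()` can still run over; bounding that would need a per-request timeout on the HTTP client.

```python
import asyncio
import time
from typing import Optional

_POLL_INITIAL_INTERVAL = 0.01  # hypothetical; only the 5.0 s cap is from the review
_POLL_BACKOFF_FACTOR = 2.0
_POLL_MAX_INTERVAL = 5.0


class FakeJob:
    """Hypothetical job that reports done after a fixed number of status polls."""

    def __init__(self, polls_until_done: int):
        self._remaining = polls_until_done
        self.done = False
        self.polls = 0

    async def status(self) -> None:
        self.polls += 1
        self._remaining -= 1
        self.done = self._remaining <= 0

    async def wait(self, timeout: Optional[float] = None) -> None:
        deadline = None if timeout is None else time.monotonic() + timeout
        interval = _POLL_INITIAL_INTERVAL
        while not self.done:
            sleep_for = interval
            if deadline is not None:
                remaining = deadline - time.monotonic()
                if remaining <= 0:
                    raise TimeoutError(f"job not done after {timeout}s")
                sleep_for = min(interval, remaining)  # never sleep past the deadline
            await asyncio.sleep(sleep_for)
            await self.status()
            interval = min(interval * _POLL_BACKOFF_FACTOR, _POLL_MAX_INTERVAL)
```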

Bug 7 (LOW): Discovery string match "Endpoint" too broad

discovery.py — content check

if "@remote" not in content and "Endpoint" not in content:
    continue  # skip file

This matches any file containing the substring "Endpoint" anywhere — comments, docstrings, variable names like api_endpoint_url. This is a pre-filter so false positives just mean extra AST parsing (performance, not correctness), but could be tightened.
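A cheap way to tighten the pre-filter without full AST parsing is a word-boundary regex that looks for a call or decorator rather than the bare substring. This is a sketch of an alternative, not what discovery.py does. It still matches `Endpoint(` inside comments or strings, so the AST pass remains the real check.

```python
import re

# Match "@remote" as a whole word, or "Endpoint" immediately followed by a call "(".
_CANDIDATE = re.compile(r"@remote\b|\bEndpoint\s*\(")


def might_define_workers(content: str) -> bool:
    """Hypothetical pre-filter: True if the file is worth AST-parsing."""
    return _CANDIDATE.search(content) is not None
```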

Overall this is well-structured. The main concerns are Bug 1 (negative workers will propagate to the API and either error cryptically or create broken endpoints) and Bug 2/3 (route collisions are a common user error that should be caught early).


runpod-Henrik

Follow-up Review — PR #223

Nice work addressing the feedback from the first review. 5 of 7 original bugs are now fixed with tests. Here's where things stand:

Previously reported — now FIXED (thank you!)

✅ Bug 1: _normalize_workers negative/inverted validation (commit 2b7e838)
✅ Bug 2: Duplicate route detection (commit 2b7e838)
✅ Bug 3: Reserved path validation (commit 2b7e838)
✅ Bug 5: get()/post() data silently dropped in decorator mode (commit 2b7e838)
✅ _ClientCoroutine wrapper gives clear errors when using client endpoints as decorators (commit 7cf0cf8)
✅ R2 presigned URL auth header fix in upload_build() (commit c4cf791)

Still open from original review

Bug 4 (MEDIUM): Scanner hardcodes is_live_resource=True

_build_endpoint_qb_metadata(), _build_endpoint_route_metadata(), and _register_endpoint_variable() all hardcode is_live_resource=True. During flash deploy, the scanner metadata would claim live mode even when deploying. The runtime _build_resource_config() correctly calls _is_live_provisioning(), but the scanner metadata doesn't match.

Likely low blast radius since the manifest extraction unwraps Endpoint and calls _build_resource_config() which does the right thing — but the scanner metadata is misleading.

New finding

_ensure_endpoint_ready() caches URL for image= mode regardless of lb flag

# image= mode: use cached result
if self._endpoint_url is not None:
    return self._endpoint_url  # always returns first-cached format

For id= mode this is handled correctly (resolves fresh each time). But for image= mode, if the first call is QB-style (ep.run(...) → path URL), then an LB-style call (ep.get("/path") → subdomain URL) returns the wrong format from cache.

Fix: cache both formats, or re-derive from the deployed ID:

if self._endpoint_url is not None:
    if lb:
        return self._resolve_lb_url(self._deployed_id)
    return self._resolve_qb_url(self._deployed_id)
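Caching per URL style avoids the stale-format problem while still skipping repeat lookups. The resolver functions, the `ImageModeClient` class, and the URL shapes below are hypothetical placeholders (the review only establishes that QB calls use a path-style URL and LB calls a subdomain URL); the point is that the cache key includes the `lb` flag.

```python
from typing import Dict, Optional


def _resolve_qb_url(endpoint_id: str) -> str:
    return f"https://api.example.invalid/v2/{endpoint_id}"  # placeholder path-style URL


def _resolve_lb_url(endpoint_id: str) -> str:
    return f"https://{endpoint_id}.api.example.invalid"  # placeholder subdomain URL


class ImageModeClient:
    """Hypothetical client that has deployed an image and knows its endpoint ID."""

    def __init__(self, deployed_id: Optional[str]):
        self._deployed_id = deployed_id
        self._endpoint_urls: Dict[bool, str] = {}  # one cached URL per lb flag value

    def ensure_endpoint_ready(self, lb: bool = False) -> str:
        if lb in self._endpoint_urls:
            return self._endpoint_urls[lb]
        if self._deployed_id is None:
            raise RuntimeError("endpoint has not been deployed yet")
        resolve = _resolve_lb_url if lb else _resolve_qb_url
        url = resolve(self._deployed_id)
        self._endpoint_urls[lb] = url
        return url
```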

Overall

This is in great shape — the validation, error messages, and test coverage are significantly improved since the first review. The _ClientCoroutine wrapper is a particularly nice touch for catching the decorator-on-client-endpoint mistake. The scaler/template additions round out the feature set.

The is_live_resource hardcoding and URL caching bug are the only remaining concerns — neither is a blocker but the URL caching could cause confusing behavior for image= endpoints that make both QB and LB calls.


flash-singh0

looks good, bugs raised by copilot are low priority

KAJdev added 6 commits February 25, 2026 13:55

feat: add unified Endpoint class replacing 8 resource config classes

a6bea18

feat: wire Endpoint into scanner and flash run, add id= and client mode

c605303

feat: wire Endpoint into build pipeline and resource discovery

6be0140

feat: implement Endpoint client mode (run/runsync/status/HTTP methods)

021d512

feat: add EndpointJob with status()/wait()/cancel() and webhook support

b272e72

feat: deprecate legacy resource classes and @remote, update skeleton templates

dc375db

KAJdev requested a review from deanq February 25, 2026 23:15

KAJdev added 2 commits February 25, 2026 15:17

format

244c5d4

chore: fix lint errors and formatting

191ceb9

KAJdev marked this pull request as ready for review February 25, 2026 23:46

KAJdev added 4 commits February 25, 2026 16:01

fix: default to live provisioning when no explicit env signal is set

4c3ff83

fix: resolve Endpoint resource type in deploy provisioner

3be2f23

Merge branch 'main' into zeke/single-entrypoint

5eb7d57

Merge branch 'main' into zeke/single-entrypoint

335606e

KAJdev added 3 commits February 27, 2026 12:14

feat: add scaler_type, scaler_value, and template params to Endpoint

3787b4b

fix: suppress warnings from internal calls

ec3c7d7

Merge branch 'main' into zeke/single-entrypoint

a1c48eb

KAJdev mentioned this pull request Feb 27, 2026

refactor: Endpoint api migration runpod/flash-examples#36

Merged

5 tasks

KAJdev added 2 commits March 2, 2026 11:00

chore: update docs

f428071

Merge branch 'main' into zeke/single-entrypoint

2807254

deanq changed the title ~~feat: single entrypoint~~ feat: Endpoint class as a single entrypoint uniting @remote and ServerlessResource-based classes Mar 3, 2026

deanq changed the title ~~feat: Endpoint class as a single entrypoint uniting @remote and ServerlessResource-based classes~~ refactor: Endpoint class as a single entrypoint uniting @remote and ServerlessResource-based classes Mar 3, 2026

Merge branch 'main' into zeke/single-entrypoint

349e235

deanq requested a review from Copilot March 3, 2026 07:23

Copilot started reviewing on behalf of deanq March 3, 2026 07:24

Copilot AI reviewed Mar 3, 2026


runpod-Henrik reviewed Mar 4, 2026


KAJdev added 13 commits March 3, 2026 17:13

fix: add input validation for Endpoint workers, routes, and decorator args

2b7e838

Merge branch 'main' into zeke/single-entrypoint

3adabeb

Merge branch 'zeke/single-entrypoint' of https://github.com/runpod/flash into zeke/single-entrypoint

efa2412

fix: default LB endpoints to REQUEST_COUNT scaler type

2382fcd

fix: tolerate re-imported GpuType/GpuGroup enums in _normalize_gpu

19e1e56

fix: detect cross-endpoint calls inside class methods

9a6ad0b

fix: strip Authorization header from R2 presigned URL uploads

c4cf791

fix: format endpoint.py and remove unused import

9f1318a

fix: update skeleton template tests to assert Endpoint class

9bf315c

fix: mark login tests as serial to prevent parallel interference

2dd59b9

fix: use LB subdomain URLs for client requests and handle deployed worker env

9c5732f

fix: reject Endpoint(id=) and Endpoint(image=) as decorators

f29de9d

fix: wrap client HTTP calls in _ClientCoroutine for clear decorator errors

7cf0cf8

runpod-Henrik reviewed Mar 4, 2026


KAJdev added 5 commits March 4, 2026 11:01

fix: generate class-aware deployed handler for class-based @remote workers

7e6328e

fix: always generate deployed handlers in flash build, remove dead live template

07e313b

fix: update build pipeline integration tests for deployed handler template

90ce668

Merge branch 'main' into zeke/single-entrypoint

02e758f

docs: rewrite all docs and README for Endpoint class API

d5126ef

flash-singh0 self-requested a review March 5, 2026 00:45

flash-singh0 approved these changes Mar 5, 2026


KAJdev merged commit 5c3f3a6 into main Mar 5, 2026
6 checks passed

KAJdev deleted the zeke/single-entrypoint branch March 5, 2026 00:48

runpod-release-please-bot bot mentioned this pull request Mar 5, 2026

chore: release 1.6.1 #244

Merged

KAJdev merged 36 commits into main from zeke/single-entrypoint

5 participants
